Estimating the effectiveness of restrictions on reducing the spread of Coronavirus

Using the infection estimates from healthdata.org, we compute fractional change metrics that combine the new infection rate with the recovery and death rates. Under the assumption that the studied countermeasures do not affect recovery and death rates, we construct a Bayesian model to estimate how effectively each restriction cataloged in the healthdata.org dataset reduces the infection rate.
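As a toy illustration of the metric (hypothetical numbers, computed the same way as the fractional diff later in the notebook):

```python
import pandas as pd

# Hypothetical daily infection estimates for one location
est_infections = pd.Series([200.0, 260.0, 325.0, 390.0])

# Day-over-day change, expressed as a fraction of the same day's estimate --
# the same form as fractional_diff_est_infections_mean below
fractional_diff = est_infections.diff() / est_infections
```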

The data is stored in two CSV files. The primary file contains dated estimates for each included location, such as total active infections and burden on ICUs. The second contains aggregate statistics along with start and end dates for each of the countermeasures.

In [3]:
healthdata = pd.read_csv('2020_05_28/Hospitalization_all_locs.csv', parse_dates=['date'])

stats = pd.read_csv('2020_05_28/Summary_stats_all_locs.csv', parse_dates=[
    'peak_bed_day_mean', 'peak_bed_day_lower',
    'peak_bed_day_upper', 'peak_icu_bed_day_mean', 'peak_icu_bed_day_lower',
    'peak_icu_bed_day_upper', 'peak_vent_day_mean', 'peak_vent_day_lower',
    'peak_vent_day_upper', 'travel_limit_start_date', 'travel_limit_end_date',
    'stay_home_start_date', 'stay_home_end_date',
    'educational_fac_start_date', 'educational_fac_end_date',
    'any_gathering_restrict_start_date', 'any_gathering_restrict_end_date',
    'any_business_start_date', 'any_business_end_date',
    'all_non-ess_business_start_date', 'all_non-ess_business_end_date',
])
In [4]:
# We melt the start and end dates and drop the missing data so we can determine
# whether or not a policy was in place using an asof merge to the most recent change
countermeasures = stats.melt(
    id_vars=['location_name'],
    value_vars=[c for c in stats.columns if 'date' in c],
    value_name='date'
).dropna()
countermeasures['type'] = countermeasures['variable'].apply(
    lambda name: '_'.join(name.replace('-', '_').split('_')[:-2])
)
countermeasure_types = sorted(set(countermeasures['type']))
countermeasures['enforced'] = countermeasures['variable'].apply(lambda name: name.split('_')[-2] == 'start')
countermeasures.drop(columns=['variable'], inplace=True)
countermeasures.sort_values('date', inplace=True)
countermeasure_types
Out[4]:
['all_non_ess_business',
 'any_business',
 'any_gathering_restrict',
 'educational_fac',
 'stay_home',
 'travel_limit']
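The asof merge mentioned in the comment above works by giving each observation the most recent policy change on or before its date. A minimal sketch with made-up dates:

```python
import pandas as pd

# Hypothetical policy changes for one location, sorted by date
changes = pd.DataFrame({
    'date': pd.to_datetime(['2020-03-10', '2020-05-01']),
    'enforced': [True, False],
})
# Observation dates before, during, and after enforcement
obs = pd.DataFrame({'date': pd.to_datetime(['2020-03-01', '2020-04-01', '2020-05-15'])})

# Each row picks up the value of the most recent change at or before its date;
# dates before the first change get NaN (no policy recorded yet)
merged_toy = pd.merge_asof(obs, changes, on='date')
```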
In [5]:
merged = healthdata.copy()

# healthdata uses projections for the most recent two weeks, just as it does for
# future dates, to mitigate reporting lag. To look only at estimates for days with
# data, discard anything newer than two weeks before the latest data.
merged.dropna(subset=['confirmed_infections'], inplace=True)
merged = merged.loc[merged['date'] < merged['date'].max() - pd.to_timedelta('2w')]

# Countries that are broken down by region also appear as aggregate rows, but
# without countermeasure data, so we discard the aggregates and keep the subdivisions.
group_names = ['Brazil', 'Canada', 'Germany', 'Italy', 'Mexico', 'Spain', 'United States of America']
merged = merged.loc[~merged['location_name'].isin(group_names)]

# To reduce noise, only investigate growth once a location has more than N infections
N = 200
merged = merged.loc[merged['est_infections_mean'] > N]

# Drop locations with fewer than 40 samples
enough = merged.groupby('location_name')['V1'].transform('count') >= 40
merged = merged.loc[enough]
In [6]:
merged.sort_values('date', inplace=True)

for countermeasure_type, subset in countermeasures.groupby('type'):
    subset = subset.drop(columns='type').rename(columns={'enforced': countermeasure_type})
    merged = pd.merge_asof(merged, subset, on='date', by='location_name')
    merged[countermeasure_type].fillna(False, inplace=True)

cols = ['est_infections_mean', 'est_infections_lower', 'est_infections_upper']
diffs = merged.groupby('location_name')[cols].diff().rename(columns={c: f'diff_{c}' for c in cols})
merged = pd.concat([merged, diffs], axis=1)

merged['fractional_diff_est_infections_mean'] = merged['diff_est_infections_mean'] / merged['est_infections_mean']

merged['day_of_year'] = merged['date'].dt.dayofyear
merged['day_of_week'] = merged['date'].dt.dayofweek
merged['random'] = np.random.normal(size=len(merged))

The enforcement period of each category of countermeasure is highlighted in red. Note the limited prevalence of such measures in the first weeks of data and the relative infrequency of limitations on travel.

In [8]:
plot
Out[8]:

The following plot shows the frequency of each countermeasure combination, with True for enforced countermeasures toward the upper right. Note that there are no samples where any_business is False but all_non_ess_business is True, since the latter is simply a stricter level of the former. The same holds for any_gathering_restrict and stay_home.
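That implication can be checked directly; a sketch on toy rows (the same check would run unchanged on merged):

```python
import pandas as pd

# Toy enforcement flags; in the real data, all_non_ess_business implies any_business
toy = pd.DataFrame({
    'any_business':         [True, True, False],
    'all_non_ess_business': [True, False, False],
})

# No row may enforce the stricter measure without the broader one
violations = (toy['all_non_ess_business'] & ~toy['any_business']).sum()
```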

In [10]:
plot
Out[10]:
In [11]:
x = merged[['fractional_diff_est_infections_mean', 'date',
            'day_of_week', 'location_name',
           ] + countermeasure_types].dropna()
location_indices, location_names = pd.factorize(x['location_name'])

with pm.Model() as model:
    rate = pm.Normal(
        'baseline_rate', mu=0.4, sigma=0.2, shape=len(location_names)
    )[location_indices]
    # Continued in sub-slides
In [12]:
with model:
    # Model the weekly periodicity
    amplitude = pm.HalfNormal(
        'amplitude', sigma=0.4, shape=len(location_names)
    )[location_indices]
    phase0 = pm.Uniform(
        'phase0', 0, 7, shape=len(location_names)
    )[location_indices]
    rate += amplitude * np.cos(
        2 * np.pi * (np.asarray(x['day_of_week']) - phase0) / 7
    )
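The cosine term above repeats with a period of seven days; a quick numeric check with hypothetical amplitude and phase (the model infers these per location):

```python
import numpy as np

amplitude, phase0 = 0.1, 2.0   # hypothetical values for illustration
day_of_week = np.arange(14)    # two weeks of day indices

# Same functional form as the model's weekly term
weekly = amplitude * np.cos(2 * np.pi * (day_of_week - phase0) / 7)
```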
In [13]:
with model:
    for v in ['educational_fac', 'travel_limit',
              'all_non_ess_business', 'stay_home']:
        rate += pm.Normal(v, mu=0, sigma=0.4) * np.asarray(x[v])
    rate += (
        pm.Normal('any_gathering_restrict', mu=0, sigma=0.4) *
        np.asarray(x['any_gathering_restrict'] & ~x['stay_home'])
    )
    rate += (
        pm.Normal('any_business', mu=0, sigma=0.4) *
        np.asarray(x['any_business'] & ~x['all_non_ess_business'])
    )
In [14]:
with model:
    # Model effect of both stay_home and all_non_ess_business
    rate += (
        pm.Normal('stay_home&all_non_ess_business', mu=0, sigma=0.4) *
        np.asarray(x['stay_home'] & x['all_non_ess_business'])
    )
    pm.Deterministic('rate', rate)
    
    err = pm.HalfNormal('err', sigma=0.4, shape=len(location_names))[location_indices]
    pm.Normal('observations', mu=rate, sigma=err,
              observed=np.asarray(x['fractional_diff_est_infections_mean']))
In [15]:
with model:
    trace = pm.sample(2_000, tune=1_000, chains=4, cores=4)
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [err, stay_home&all_non_ess_business, any_business, any_gathering_restrict, stay_home, all_non_ess_business, travel_limit, educational_fac, phase0, amplitude, baseline_rate]
Sampling 4 chains, 0 divergences: 100%|██████████| 12000/12000 [04:42<00:00, 42.43draws/s]
The number of effective samples is smaller than 10% for some parameters.

The following plot shows the relative effectiveness of each countermeasure according to our model. Note that stay_home has a stronger reducing effect than any_gathering_restrict, and likewise all_non_ess_business over any_business, as expected. Also, the combined effect of stay_home and all_non_ess_business is less than the sum of their independent effects, because the stay_home&all_non_ess_business interaction term contributes positively. The weak effect of travel_limit is likely due to travel restrictions almost never being put in place without other measures; its apparent effect is therefore only its marginal contribution on top of those measures.
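The sub-additivity reads off directly from the coefficients: where both measures are in force, the model adds all three terms. With illustrative numbers (not the fitted posterior means):

```python
# Illustrative coefficients (negative = reduces the growth rate)
stay_home = -0.08
all_non_ess_business = -0.05
interaction = 0.04  # the stay_home&all_non_ess_business term, positive

# Effect when both are enforced: less reduction than the sum alone would give
combined = stay_home + all_non_ess_business + interaction
```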

In [17]:
plot
Out[17]:

To assess the quality of our model, we can look at how its predictions align with the original data.

In [18]:
# Interior percentiles of the posterior rate samples; tail_percentile maps each
# to its two-sided tail mass (e.g. the 2.5th and 97.5th percentiles both map to 5%)
percentile = np.linspace(0, 100, 1003)[1:-1]
tail_percentile = 100 - abs(percentile * 2 - 100)
    
def plot_rate(location):
    idx = location_names.get_loc(location)
    these = location_indices == idx
    rate = trace['rate'][:, these]
    Y = np.percentile(rate, percentile, axis=0)
    
    t = x.loc[these, 'date']
    df = [{'t': t, 'rate': y, 'P': p} for y, p in zip(Y, tail_percentile)]
    return (
        hv.Contours(df, ['t', 'rate'], 'P')
        .options(cmap='plasma', colorbar=True, show_legend=False, line_width=2, logz=True, cformatter='%.2g%%') *
        hv.Curve(merged.loc[merged['location_name'] == location], 'date', 'fractional_diff_est_infections_mean')
    ).options(aspect=5/3, responsive=True)
In [19]:
plot_rate('Lombardia')
Out[19]:
In [20]:
plot_rate('Veneto')
Out[20]:
In [21]:
plot_rate('Virginia')
Out[21]:
In [22]:
plot_rate('Washington')
Out[22]:
In [23]:
plot_rate('New York')
Out[23]:
In [24]:
plot_rate('France')
Out[24]:
In [25]:
plot_rate('Republic of Korea')
Out[25]:

This analysis has shown that, under some simplifying assumptions, we can estimate the effectiveness of these Coronavirus countermeasures in terms of each one's ability to reduce the case growth rate globally. We coarsely rank their individual effectiveness as

  1. stay at home order
  2. closing educational facilities
  3. closing all non-essential businesses
  4. any restrictions on businesses
  5. any restrictions on gatherings
  6. limitations on travel

The simplicity of this model is evident in the strong structure in the residuals between the model predictions and the healthdata estimates. It might be possible to overcome some of this by estimating onset rates for compliance with these measures and by cataloging significant events that affect the case growth in any one location.